5 research outputs found

    Lexical measurements for information retrieval: a quantum approach

    The problem of determining whether a document is about a loosely defined topic is at the core of text Information Retrieval (IR). An automatic IR system should be able to determine whether a document is likely to convey information on a topic and, in most cases, it must do so solely from measurements of the use of terms in the document (lexical measurements). In this work a novel scheme for measuring and representing lexical information from text documents is proposed. This scheme is inspired by the concept of ideal measurement as described by Quantum Theory (QT), and we apply it to Information Retrieval through formal analogies between text processing and physical measurements. The main contribution of this work is the development of a complete mathematical scheme to describe lexical measurements. These measurements encompass current ways of representing text, but also completely new representation schemes for it. For example, this quantum-like representation includes logical features such as non-Boolean behaviour, which has been suggested to be a fundamental issue when extracting information from natural language text. The scheme also provides a formal unification of logical, probabilistic and geometric approaches to the IR problem. From the concepts and structures in this scheme of lexical measurement, and using the principle of uncertain conditional, an "Aboutness Witness" is defined as a transformation that can detect documents that are relevant to a query. Mathematical properties of the Aboutness Witness are described in detail and related to other concepts from Information Retrieval. A practical application of this concept is also developed for ad hoc retrieval tasks and evaluated with standard collections. Even though the model instantiated here does not lead to substantial performance improvements, it is shown how it can be extended and improved, as well as how it can generate a whole range of radically new models and methodologies. This work opens a number of research possibilities, both theoretical and experimental: new representations for documents in Hilbert spaces or other forms, methodologies for term weighting to be used either within the proposed framework or independently, ways to extend existing methodologies, and a new range of operator-based methods for several tasks in IR.
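    As a minimal illustration only, and not the construction developed in this work, the sketch below shows one way an operator acting on document vectors could play the role of an "Aboutness Witness"-style detector: documents are unit vectors over a toy vocabulary, the query defines a projector onto its term subspace, and the squared norm of the projected document vector is read as a detection probability. The vocabulary, document counts, and query terms are hypothetical.

```python
import numpy as np

# Illustrative sketch only: a projector-style relevance test in a term vector
# space, loosely in the spirit of an operator detecting "aboutness".
# The vocabulary, document, and query below are made up for this example.

vocab = ["quantum", "retrieval", "measurement", "music"]

def doc_vector(term_counts):
    """Represent a document as a unit vector over the toy vocabulary."""
    v = np.array([term_counts.get(t, 0) for t in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def projector(terms):
    """Projector onto the subspace spanned by the (one-hot) query terms."""
    basis = np.array([[1.0 if t == u else 0.0 for u in vocab] for t in terms])
    return basis.T @ basis  # rows are orthonormal, so B^T B is a projector

d = doc_vector({"quantum": 3, "retrieval": 2, "measurement": 1})
P = projector(["quantum", "retrieval"])

# Squared norm of the projected vector, read as the probability that a
# "measurement" of the query subspace detects the document.
score = float(d @ P @ d)
print(f"aboutness score: {score:.3f}")
```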

    A game theory approach for estimating reliability of crowdsourced relevance assessments

    In this article, we propose an approach to improve quality in crowdsourcing (CS) tasks using Task Completion Time (TCT) as a source of information about the reliability of workers in a game-theoretical competitive scenario. Our approach is based on the hypothesis that some workers are more risk-inclined and tend to gamble with their use of time when put in competition with other workers, a hypothesis supported by our previous simulation study. We test our approach with 35 topics from experiments on the TREC-8 collection, assessed as relevant or non-relevant by crowdsourced workers in both a competitive (referred to as "Game") and a non-competitive (referred to as "Base") scenario. We find that competition changes the distributions of TCT, making them sensitive to the quality (i.e., wrong or right) and outcome (i.e., relevant or non-relevant) of the assessments. We also test an optimal function of TCT as weights in a weighted majority voting scheme. From probabilistic considerations, we derive a theoretical upper bound for the weighted majority performance of cohorts of 2, 3, 4, and 5 workers, which we use as a criterion to evaluate the performance of our weighting scheme. We find that our approach achieves remarkable performance, significantly closing the gap between the accuracy of the obtained relevance judgements and the upper bound. Since our approach takes advantage of TCT, a quantity available in any CS task, we believe it is cost-effective and can therefore be applied for quality assurance in crowdsourcing for micro-tasks.
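    The article derives its weighting and upper bound from probabilistic considerations; the sketch below only illustrates the general mechanism of TCT-weighted majority voting, with a hypothetical weight function and made-up worker votes.

```python
from collections import defaultdict

# Minimal sketch of weighted majority voting over crowdsourced labels, where
# each worker's weight depends on Task Completion Time (TCT). The weight
# function and the example votes are assumptions for illustration only.

def tct_weight(tct_seconds):
    """Hypothetical weight: quicker answers are weighted more heavily."""
    return 1.0 / (1.0 + tct_seconds)

def weighted_majority(votes):
    """votes: list of (label, tct_seconds) pairs from different workers."""
    totals = defaultdict(float)
    for label, tct in votes:
        totals[label] += tct_weight(tct)
    return max(totals, key=totals.get)

votes = [("relevant", 4.2), ("non-relevant", 11.0), ("relevant", 6.5)]
print(weighted_majority(votes))  # -> 'relevant' under this toy weighting
```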

    A game theory approach for effective crowdsource based relevance assessment

    Despite the ever-increasing popularity of crowdsourcing (CS) in both industry and academia, procedures that ensure quality in its results remain elusive. We hypothesise that a CS design based on game theory can persuade workers to perform their tasks as quickly as possible and with the highest quality. To that end, in this article we propose a CS framework inspired by the n-person Chicken game. Our aim is to address the problem of CS quality without compromising CS benefits such as low monetary cost and high task completion speed. With that goal in mind, we study the effects of knowledge updates as well as incentives for good workers to continue playing. We define a general task with the characteristics of relevance assessment as a case study, because it has been widely explored in the past with CS due to its potential cost and complexity. To investigate our hypotheses, we conduct a simulation where we study the effect of the proposed framework on data accuracy, task completion time, and total monetary rewards. Based on a game-theoretical analysis, we study how different types of individuals would behave under a particular game scenario. In particular, we simulate a population composed of different types of workers with varying ability to formulate optimal strategies and learn from their experiences. A simulation of the proposed framework produced results that support our hypothesis.
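    The sketch below is a toy, hypothetical version of an n-person Chicken-style round between workers who either work carefully ("swerve") or gamble on speed ("dare"); the payoff values and worker behaviour are assumptions for illustration and are far simpler than the framework simulated in the article.

```python
import random

# Toy n-person Chicken-style round: if every worker gambles ("dare"),
# quality collapses and all are penalised; otherwise gamblers earn a bonus.
# Payoffs and dare-propensities are hypothetical.

def round_payoffs(choices, reward=1.0, bonus=0.5, crash=-1.0):
    if all(c == "dare" for c in choices):
        return [crash] * len(choices)
    return [reward + bonus if c == "dare" else reward for c in choices]

random.seed(0)
workers = [0.2, 0.5, 0.8]  # each worker's propensity to dare (risk-inclination)
for _ in range(3):
    choices = ["dare" if random.random() < p else "swerve" for p in workers]
    print(choices, round_payoffs(choices))
```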